1 Climate change and temperature anomalies

We are analysing a dataset from NASA’s Goddard Institute for Space Studies to study the effects of climate change in the Northern Hemisphere. Glimpsing at the data, we see there are 19 variables and 143 observations, representing the period between 1880-2022:

weather <- 
  read_csv("https://data.giss.nasa.gov/gistemp/tabledata_v4/NH.Ts+dSST.csv", 
           skip = 1, 
           na = "***")
glimpse(weather)
## Rows: 143
## Columns: 19
## $ Year  <dbl> 1880, 1881, 1882, 1883, 1884, 1885, 1886, 1887, 1888, 1889, 1890…
## $ Jan   <dbl> -0.39, -0.30, 0.26, -0.58, -0.16, -1.01, -0.75, -1.08, -0.49, -0…
## $ Feb   <dbl> -0.53, -0.24, 0.21, -0.66, -0.11, -0.45, -0.84, -0.71, -0.61, 0.…
## $ Mar   <dbl> -0.23, -0.05, 0.02, -0.15, -0.64, -0.23, -0.71, -0.44, -0.64, -0…
## $ Apr   <dbl> -0.30, -0.02, -0.30, -0.30, -0.59, -0.49, -0.37, -0.38, -0.22, 0…
## $ May   <dbl> -0.05, 0.05, -0.23, -0.25, -0.36, -0.58, -0.34, -0.25, -0.15, -0…
## $ Jun   <dbl> -0.18, -0.33, -0.28, -0.11, -0.41, -0.45, -0.37, -0.20, -0.03, -…
## $ Jul   <dbl> -0.21, 0.10, -0.28, -0.05, -0.41, -0.34, -0.14, -0.24, 0.00, -0.…
## $ Aug   <dbl> -0.25, -0.04, -0.14, -0.22, -0.51, -0.41, -0.43, -0.54, -0.21, -…
## $ Sep   <dbl> -0.24, -0.28, -0.24, -0.34, -0.45, -0.40, -0.33, -0.21, -0.20, -…
## $ Oct   <dbl> -0.30, -0.44, -0.51, -0.16, -0.44, -0.37, -0.31, -0.49, -0.03, -…
## $ Nov   <dbl> -0.43, -0.36, -0.33, -0.44, -0.57, -0.38, -0.40, -0.27, -0.01, -…
## $ Dec   <dbl> -0.42, -0.23, -0.68, -0.15, -0.47, -0.11, -0.22, -0.43, -0.24, -…
## $ `J-D` <dbl> -0.29, -0.18, -0.21, -0.29, -0.43, -0.43, -0.43, -0.44, -0.24, -…
## $ `D-N` <dbl> NA, -0.19, -0.17, -0.33, -0.40, -0.46, -0.42, -0.42, -0.25, -0.1…
## $ DJF   <dbl> NA, -0.32, 0.08, -0.64, -0.14, -0.64, -0.57, -0.67, -0.51, -0.08…
## $ MAM   <dbl> -0.19, -0.01, -0.17, -0.24, -0.53, -0.43, -0.47, -0.36, -0.33, 0…
## $ JJA   <dbl> -0.21, -0.09, -0.23, -0.13, -0.44, -0.40, -0.31, -0.33, -0.08, -…
## $ SON   <dbl> -0.32, -0.36, -0.36, -0.31, -0.49, -0.38, -0.35, -0.32, -0.08, -…

For the purpose of our analysis, we have decided to select only data pertaining to temperature deviation (delta) by month, and manipulate the dataframe to facilitate further investigation:

tidyweather <- select(weather, 1:13) %>% 
  pivot_longer(!Year, names_to = 'month', values_to = 'delta')
head(tidyweather)
## # A tibble: 6 × 3
##    Year month delta
##   <dbl> <chr> <dbl>
## 1  1880 Jan   -0.39
## 2  1880 Feb   -0.53
## 3  1880 Mar   -0.23
## 4  1880 Apr   -0.3 
## 5  1880 May   -0.05
## 6  1880 Jun   -0.18

1.1 Plotting Information

First, we are plotting a scatter plot to visualize the evolution of delta (temperature deviation) over time:

tidyweather <- tidyweather %>%
  mutate(date = ymd(paste(as.character(Year), month, "1")),
         month = month(date, label=TRUE),
         year = year(date))

gg <- ggplot(tidyweather, aes(x=date, y = delta))+
  geom_point()+
  geom_smooth(color="red") +
  theme_bw() +
  labs (
    title = "Weather Anomalies"
  )

#plot_gg(gg,multicore=TRUE,width=5,height=5,scale=250)

Adding a line of best fit to the scatterplot clearly shows that, while deltas were close to 0 between approximately 1935-1975 and negative before that time, temperature in the Northern Hempishere has been quickly increasing since then. Notice that the rate of the increase has been increasing as well (as signified by increasing deltas).

Next, we will use facet_wrap() to visualize the effects of increasing temperature by month:

We can see that the effect of rising temperature in the Northern Hemisphere is common to all months of the year.

As a means to further investigate the effects of climate change, we will partition the data into time periods, particularly decades. To that end, we will use case_when():

comparison <- tidyweather %>% 
  filter(Year>= 1881) %>%     #remove years prior to 1881
  #create new variable 'interval', and assign values based on criteria below:
  mutate(interval = case_when(
    Year %in% c(1881:1920) ~ "1881-1920",
    Year %in% c(1921:1950) ~ "1921-1950",
    Year %in% c(1951:1980) ~ "1951-1980",
    Year %in% c(1981:2010) ~ "1981-2010",
    TRUE ~ "2011-present"
  ))

In order to study the effects of climate change by decade, we will produce a density plot to investigate the distribution of monthly temperature deviations by the time periods selected above:

ggplot(comparison) +
  aes(x = delta, fill = interval)+
  #facet_wrap(~month)+
  geom_density(alpha = 0.2) 

The plot clearly shows that with the passage of time, deltas have consistently moved to the right hand side of the plot. In other words, temperature deviations have been increasing over time.

Lastly, we will also consider annual anomalies by grouping the monthly data and producing a scatterplot:

#creating yearly averages
average_annual_anomaly <- tidyweather %>% 
  group_by(Year) %>%   #grouping data by Year
  
  # creating summaries for mean delta 
  # use `na.rm=TRUE` to eliminate NA (not available) values 
  summarise(yearly_mean = mean(delta, na.rm=TRUE)) 
  
average_annual_anomaly
## # A tibble: 143 × 2
##     Year yearly_mean
##    <dbl>       <dbl>
##  1  1880      -0.294
##  2  1881      -0.178
##  3  1882      -0.208
##  4  1883      -0.284
##  5  1884      -0.427
##  6  1885      -0.435
##  7  1886      -0.434
##  8  1887      -0.437
##  9  1888      -0.236
## 10  1889      -0.177
## # … with 133 more rows
#plotting the data
#Fit the best fit line, using LOESS method
ggplot(average_annual_anomaly) +
  aes(x = Year, y = yearly_mean)+
  geom_point()+
  geom_smooth(method = 'lm') +
  theme_bw()

The plot proves that annual temprature deltas have been increasing over time.

1.2 Confidence Interval for delta

We will now focus on the time period between 2011-present. We divert our attention towards producing a confidence interval for the average annual deltas calculated since 2011. We will use both the statistical method and bootstrap simulation to have more confidence in the results:

#statistical method
formula_ci <- comparison %>% 
  filter(interval == '2011-present') %>% 
  group_by(year) %>% 
  summarise(avg_annual_delta = mean(delta),
            sd_delta = sd(delta),
            count = n(),
            SE = sd(delta/sqrt(count)),
            upper_ci = avg_annual_delta + 2*SE,
            lower_ci = avg_annual_delta - 2*SE)

#print out formula_CI
formula_ci
## # A tibble: 12 × 7
##     year avg_annual_delta sd_delta count      SE upper_ci lower_ci
##    <dbl>            <dbl>    <dbl> <int>   <dbl>    <dbl>    <dbl>
##  1  2011            0.745    0.113    12  0.0327    0.810    0.680
##  2  2012            0.815    0.179    12  0.0517    0.918    0.712
##  3  2013            0.8      0.118    12  0.0340    0.868    0.732
##  4  2014            0.92     0.145    12  0.0420    1.00     0.836
##  5  2015            1.18     0.178    12  0.0515    1.28     1.07 
##  6  2016            1.31     0.333    12  0.0961    1.50     1.12 
##  7  2017            1.18     0.226    12  0.0653    1.31     1.05 
##  8  2018            1.04     0.137    12  0.0396    1.12     0.957
##  9  2019            1.21     0.153    12  0.0441    1.30     1.12 
## 10  2020            1.35     0.225    12  0.0648    1.48     1.22 
## 11  2021            1.14     0.117    12  0.0339    1.20     1.07 
## 12  2022           NA       NA        12 NA        NA       NA
#bootstrap simulation  

set.seed(1234)
bootstrap <- comparison %>% 
  filter(interval == '2011-present') %>% 
  specify(response = delta) %>% 
  generate(type = 'bootstrap') %>% 
  calculate(stat = 'mean') 

bootstrap_ci <- bootstrap %>% get_confidence_interval(level = 0.95)

#print out confidence intervals
bootstrap_ci
## # A tibble: 1 × 2
##   lower_ci upper_ci
##      <dbl>    <dbl>
## 1     1.09     1.09
#visualise
visualise(bootstrap)

library(gapminder)
library(gganimate)
library(png)
library(gifski)

#skimr::skim(gapminder)

# Use the gapminder dataset in ggplot
ggplot(data=gapminder,
       aes(x=gdpPercap, y=lifeExp, size=pop, color=country)) +
  # Add a point geom
 geom_point(alpha=0.7, show.legend=FALSE) +
# Add some manual scaling and facets 
 scale_colour_manual(values=country_colors) +
 scale_size(range=c(2, 12)) +
 scale_x_log10() +
 facet_wrap(~continent, nrow=1) + 
# Animate figure with gganimate package
 transition_time(year) +
 ease_aes('linear') +
 labs(title='Year: {frame_time}', 
      x='GDP per capita', 
      y='Life expectancy')

Looking at the results of the analysis, we can see that the statistical method produces wider confidence intervals for temperature deltas, ranging from 0.13 to approximately 0.3 in width. This is probably due to the low number of observations (12 months in each year), which prohibit a more precise calculation. On the other hand, using bootstrap simulation allows to get a much narrower confidence interval. However, both methods show that temperature deltas have been positive in the time period in question and have been consistently greater than 1 since 2015.